Abstract. We have found that the effective survival time of amino acids
in organisms follows a power law with respect to the frequency of their
occurrence in genes. We used the PAM1 PET91 mutation data matrix to
calculate the selection pressure on each kind of amino acid. The results
were compared with the MPM1 matrix (Mutation Probability Matrix), which
represents the pure mutational pressure in the Borrelia burgdorferi genome.
The results are universal in the sense that the survival time of amino
acids calculated from the higher-order PAMk matrices (k > 1) follows
the same power law as in the case of PAM1 matrices.

1 Introduction 

Determining the evolutionary distance between two protein sequences requires
knowledge of the substitution rates of amino acids. It is generally accepted
that the more substitutions are necessary to change one sequence into another,
the more unrelated they are and the larger their distance to the common
ancestor. The most widely used method for calculating distances between
sequences is based on the mutation data matrix, $M_{ij}$, published by Dayhoff
et al. [1], where i, j represent amino acids and an element $M_{ij}$ of the
matrix gives the probability that the amino acid in column j will be replaced
by the amino acid in row i after a given evolutionary time interval. The
interval corresponding to 1 percent of substitutions between two compared
sequences is called one PAM (Percent of Accepted Mutations), and the
corresponding matrix is denoted as the PAM1 matrix. A Markov model of sequence
evolution is assumed, so that a simple power $M^k$ of the PAM1 matrix (the
matrix multiplied by itself k times) gives a matrix, PAMk, containing the
amino acid substitution probabilities after k PAMs. Today, a much more
accurate PAM matrix is available, generated from 16130 protein sequences and
published by Jones et al. [2]. The large number of compared genes guarantees
that the matrix has negligible statistical errors, and it can be considered
the reference matrix for calculating phylogenetic distances. This matrix is
also known as the PET91 matrix.
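
The relation between PAM1 and PAMk can be illustrated with a minimal sketch.
The 3-state matrix below is a hypothetical toy example, not the PET91 matrix;
it only follows the same convention (column j is the original state, row i the
replacement, so each column sums to 1). The function names are illustrative.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, k):
    """Return m multiplied by itself k times (k >= 0)."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, m)
    return result


# Hypothetical toy "PAM1" for three states; every column sums to 1.
toy_pam1 = [
    [0.990, 0.004, 0.003],
    [0.006, 0.992, 0.002],
    [0.004, 0.004, 0.995],
]

toy_pam250 = mat_pow(toy_pam1, 250)

# Columns of any power of a column-stochastic matrix still sum to 1.
column_sums = [sum(toy_pam250[i][j] for i in range(3)) for j in range(3)]
print(column_sums)
```

Under the Markov assumption the same routine applied to a real 20 x 20 PAM1
matrix would give the substitution probabilities after k PAMs.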

Recently, by comparing intergenic sequences that are remnants of coding
sequences with the homologous sequences of genes, we constructed an empirical
table of nucleotide substitution rates for the leading DNA strand of the
B. burgdorferi genome [3], [4], [5]. We found that the substitution rates,
which determine the evolutionary turnover time of a given kind of nucleotide
in third codon positions of coding sequences, are highly correlated with the
frequency of occurrence of that nucleotide in the sequences. The replication
process produces a compositional bias, introducing long-range correlations
among nucleotides in the third codon positions, which is very similar to the
bias seen in the intergenic sequences [6].

We used the empirical table of nucleotide substitution rates to simulate
mutational pressure on the genes lying on the leading DNA strand of the
B. burgdorferi genome, and we constructed the MPM1 matrix (Mutation
Probability Matrix) for amino acid substitutions in the evolving genes. The
resulting table thus represents the percent of amino acid substitutions
introduced by mutational pressure and not by selection. Next, we compared the
survival times of the amino acids in the case without any selection with the
effective survival times of the amino acids calculated from the PAM1 PET91
matrix.



2 Mutation Table for Nucleotides 

The DNA sequence of the B. burgdorferi genome was downloaded from the website
www.ncbi.nlm.nih.gov. The empirical mutation table for nucleotides in third
codon positions, which we used in this paper, is the following [3], [4]:



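$$
M = \begin{pmatrix}
1 - uW_a & uW_{at} & uW_{ag} & uW_{ac} \\
uW_{ta} & 1 - uW_t & uW_{tg} & uW_{tc} \\
uW_{ga} & uW_{gt} & 1 - uW_g & uW_{gc} \\
uW_{ca} & uW_{ct} & uW_{cg} & 1 - uW_c
\end{pmatrix} \tag{1}
$$

where

$$
\begin{array}{llll}
W_{ga} = 0.0667 & W_{gt} = 0.0347 & W_{gc} = 0.0470 & W_{ag} = 0.1637 \\
W_{at} = 0.0655 & W_{ac} = 0.0702 & W_{tg} = 0.1157 & W_{ta} = 0.1027 \\
W_{tc} = 0.2613 & W_{cg} = 0.0147 & W_{ca} = 0.0228 & W_{ct} = 0.0350
\end{array} \tag{2}
$$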



and the elements of the matrix give the probability that the nucleotide in
column j will mutate to the nucleotide in row i during one replication cycle.¹
The symbols $W_{ij}$ represent the relative substitution probability of
nucleotide j by nucleotide i, and u represents the mutation rate. The symbols
$W_j$ on the diagonal represent the relative substitution probability of
nucleotide j:

$$
W_j = \sum_{i \neq j} W_{ij}, \tag{3}
$$

and $W_a + W_t + W_g + W_c = 1$.

¹ The transposed matrix convention was chosen in [3].

The expression for the mean survival time $\tau_j$ of nucleotide j depends on
$W_j$ as follows (the derivation can be found in [3]):

$$
\tau_j = -\frac{1}{\ln(1 - uW_j)} \approx \frac{1}{uW_j}. \tag{4}
$$

The approximate formula holds for small values of the mutation rate u.
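
As a rough numerical illustration, the survival times can be evaluated from
the values in (2). The sketch below assumes that each $W_j$ is the sum of the
off-diagonal entries of column j (an assumption consistent with the
normalisation $W_a + W_t + W_g + W_c = 1$) and uses u = 0.01; the function
names are illustrative only.

```python
import math

# W[(i, j)]: relative probability that nucleotide j is replaced by nucleotide i,
# as read from (2).
W = {
    ("g", "a"): 0.0667, ("g", "t"): 0.0347, ("g", "c"): 0.0470, ("a", "g"): 0.1637,
    ("a", "t"): 0.0655, ("a", "c"): 0.0702, ("t", "g"): 0.1157, ("t", "a"): 0.1027,
    ("t", "c"): 0.2613, ("c", "g"): 0.0147, ("c", "a"): 0.0228, ("c", "t"): 0.0350,
}


def column_rate(j):
    """W_j: total relative probability that nucleotide j is substituted."""
    return sum(w for (_, source), w in W.items() if source == j)


def survival_time(j, u):
    """Mean survival time of nucleotide j, exact form of (4)."""
    return -1.0 / math.log(1.0 - u * column_rate(j))


u = 0.01  # mutation rate per replication cycle (value used in the simulations)
for j in "atgc":
    exact = survival_time(j, u)
    approx = 1.0 / (u * column_rate(j))  # small-u approximation in (4)
    print(j, round(column_rate(j), 4), round(exact, 1), round(approx, 1))
```

For u = 0.01 the exact and approximate values differ by about half a
replication cycle, which is negligible against survival times of several
hundred cycles.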

In papers [3], [4], [5], we concluded that in a natural genome the frequency
of occurrence $f_j$ of the nucleotides in the third codon position is linearly
related to the respective mean survival time $\tau_j$,

$$
f_j = m_0 \tau_j + c_0, \tag{5}
$$

with the same coefficients, $m_0$ and $c_0$, for each nucleotide. Kimura's
neutral theory of evolution [7] assumes a constant rate of evolution, where
mutations are random events, much like the random events of radioactive decay.
However, the linear law in (5) does not contradict Kimura's theory: the
mutations still represent random decay events, but they are correlated with
the DNA composition.



3 Mutational Pressure MPM1 Matrix Construction for 
Amino Acids 

We used a computer random number generator to generate random deviates with
a distribution defined by the elements of the mutation matrix in (1). With the
help of these random deviates, we mutated the nucleotide sequences of 564 genes
from the leading DNA strand of the B. burgdorferi genome. The applied value of
the mutation rate was u = 0.01.

For each gene, considered to be an ancestral one in this simulated evolution,
we prepared $10^5$ pairs of homologous sequences that diverged from this gene.
The evolution of a gene was stopped when the number of substitutions between
the homologous protein sequences reached 1%. All the sequences were translated
into amino acids, and we constructed a mutation probability matrix MPM1
according to the procedure of Dayhoff et al. [1] and Jones et al. [2]. The
resulting mutation table, with substitution probabilities $M_{ij}$, and the
amino acid mutabilities $m_j$ and fractions $f_j$ of amino acids in the
compared sequences are presented in Table 1 and Table 2, respectively.
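
One replication cycle of such a simulation can be sketched as follows. This is
only an illustration under stated assumptions, not the code used for the
paper: the matrix in (1) is applied to every position of a made-up sequence,
the seed and the function name `mutate_once` are arbitrary, and the stopping
rule at 1% of amino acid substitutions is not implemented.

```python
import random

NUCLEOTIDES = "atgc"

# W[(i, j)]: relative probability that nucleotide j is replaced by nucleotide i,
# as read from (2).
W = {
    ("g", "a"): 0.0667, ("g", "t"): 0.0347, ("g", "c"): 0.0470, ("a", "g"): 0.1637,
    ("a", "t"): 0.0655, ("a", "c"): 0.0702, ("t", "g"): 0.1157, ("t", "a"): 0.1027,
    ("t", "c"): 0.2613, ("c", "g"): 0.0147, ("c", "a"): 0.0228, ("c", "t"): 0.0350,
}


def mutate_once(seq, u, rng):
    """Apply one replication cycle of the matrix in (1) to every position."""
    out = []
    for j in seq:
        r = rng.random()              # uniform deviate in [0, 1)
        new, acc = j, 0.0
        for i in NUCLEOTIDES:
            if i == j:
                continue
            acc += u * W[(i, j)]      # cumulative probability of j -> i
            if r < acc:
                new = i
                break
        out.append(new)               # unchanged with probability 1 - u * W_j
    return "".join(out)


rng = random.Random(2024)             # fixed seed so the run is repeatable
ancestor = "atggcaattaaagaa" * 20     # hypothetical sequence, not a real gene
descendant = mutate_once(ancestor, 0.01, rng)
differences = sum(a != b for a, b in zip(ancestor, descendant))
print(len(descendant), differences)
```

Repeating the cycle on two copies of the same ancestor, translating both, and
counting amino acid differences would give one pair of diverged homologous
sequences.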

Table 1. Mutation Probability Matrix for an evolutionary distance of 1 PAM
(split into two parts). Values of the matrix elements are scaled by a factor
of $10^5$ and rounded to an integer. The symbols in the first row and the
first column represent amino acids, and the numbers following the colons give
the number of codons representing a given amino acid in the universal genetic
code.

The elements $M_{ij}$ of the MPM1 matrix in Table 1 have been scaled with the
parameter $\lambda$, which relates them to the evolutionary distance of one
percent of substitutions and is equal to 0.00009731 in our simulations. We
introduced the parameter $\lambda$ following equation (3) in the paper by
Jones et al. [2]. Therefore, the diagonal elements of the MPM1 table are
defined by the formula

$$
M_{jj} = 1 - \lambda m_j, \tag{6}
$$

and the value of $\lambda$ is calculated from the condition that, in the case
of 1% of amino acid substitutions, the total fraction of unchanged amino acids
is equal to

$$
\sum_{j=1}^{20} f_j M_{jj} = 0.99. \tag{7}
$$
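
Substituting (6) into (7) and using the fact that the fractions sum to 1 gives
$\lambda = 0.01 / \sum_j f_j m_j$. The sketch below checks this on a
hypothetical three-letter alphabet; the fractions and mutabilities are made up
for illustration and are not taken from Table 2.

```python
def scaling_lambda(fractions, mutabilities, changed=0.01):
    """Return lambda such that sum_j f_j * (1 - lambda * m_j) = 1 - changed.

    Assumes the fractions sum to 1.
    """
    weighted = sum(f * m for f, m in zip(fractions, mutabilities))
    return changed / weighted


# Hypothetical toy data for three "amino acids".
toy_fractions = [0.5, 0.3, 0.2]
toy_mutabilities = [100.0, 120.0, 80.0]

lam = scaling_lambda(toy_fractions, toy_mutabilities)

# Diagonal elements from (6) and the unchanged fraction from (7).
diagonal = [1.0 - lam * m for m in toy_mutabilities]
unchanged = sum(f * d for f, d in zip(toy_fractions, diagonal))
print(lam, unchanged)
```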



Table 2. Relative mutabilities and fractions of 20 amino acids in the compared se- 
quences. We used the convention that mutabilities are relative to alanine and it is 
arbitrarily assigned a value of 100. 



amino acid   relative mutability ($m_j$)   fraction ($f_j$)
A            100.00                        0.0449
R            126.09                        0.0369
N            110.42                        0.0671
D            109.29                        0.0579
C            262.91                        0.0073
Q             77.67                        0.0206
…             89.88                        0.0644
…            …                             0.0582
H            173.18                        0.0114
I            100.12                        …
T            …                             …
…            …                             0.0047
…            …                             0.0427
…            …                             0.0645



4 Discussion of results 

The major qualitative difference between the MPM1 matrix introduced in the
previous section and the PAM1 PET91 matrix published by Jones et al. [2] is
that the first is a result of pure mutational pressure, whereas the second is
a result of both mutational and selection pressures. Thus, two evolutionary
mechanisms are responsible for the resulting PAM matrices.

Using formula (4), extended to amino acids, we calculated the effective
survival times of amino acids for the MPM1 matrix (Table 1) and for the
mutational/selectional PAM1 PET91 matrix [2]. The scaling parameter $\lambda$
of (6) is the counterpart of u in (4). Fig. 1 shows the relation between the
calculated survival times of amino acids and their fractions in the
B. burgdorferi proteins, in the pairs of diverged genes, on a log-log scale.
The data are highly correlated, and in both cases the dependence of the mean
survival time of an amino acid on its fraction follows a power law:

$$
\tau_j \sim f_j^{\alpha} \tag{8}
$$

with a negative exponent, $\alpha \approx -1.3$, in the case of selection and
a positive exponent, $\alpha \approx 0.3$, in the case of mutation pressure on
the leading DNA strand of the B. burgdorferi genome. The value of $\alpha$ for
the analogous mutational PAM1 matrix calculated for the lagging DNA strand of
the B. burgdorferi genome is about half as large. It is worth emphasizing that
the slopes $\alpha$ are the same for the PAMk matrices with high values of k,
and thus they are universal with respect to evolution.
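
An exponent such as $\alpha$ in (8) is commonly estimated as the slope of a
least-squares line in log-log coordinates. The sketch below shows that
procedure on synthetic points generated from an exact power law; the data and
the function name are illustrative, and the paper does not state which fitting
method was actually used.

```python
import math


def fit_power_law(fractions, times):
    """Least-squares fit of log(tau) = log(c) + alpha * log(f).

    Returns (alpha, c).
    """
    xs = [math.log(f) for f in fractions]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    c = math.exp(mean_y - alpha * mean_x)
    return alpha, c


# Synthetic data following tau = 3 * f**(-1.3) exactly (not measured values).
synthetic_fractions = [0.01, 0.02, 0.05, 0.10]
synthetic_times = [3.0 * f ** -1.3 for f in synthetic_fractions]

alpha, c = fit_power_law(synthetic_fractions, synthetic_times)
print(alpha, c)
```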

Fig. 1. Relation between the survival time of amino acids and their fractions
in compared pairs of homologous genes, with selection and without selection.
Selection data for amino acids were taken from the PAM1 PET91 matrix, whereas
the data without selection were simulated using the experimentally found
mutational pressure of the B. burgdorferi genome.

Fig. 2. Gene length versus gene rank, where each gene has been assigned a rank
with respect to its fraction of tryptophan (left graph) and of leucine (right
graph). Each graph contains two plots: the dots represent gene size vs. rank,
whereas the second plot represents the fraction vs. rank.



In Fig. 1 we may observe a kind of evolutionary scissors acting on amino acids.
While the less frequent amino acids, such as W (tryptophan) and C (cysteine),
have much shorter turnover times than other amino acids (as can be seen from
the lower line), the selection pressure (upper line) counteracts this effect.
On the other hand, the most mutable amino acids, such as L (leucine) or I
(isoleucine), which are very frequent in genes, seem to be much more weakly
influenced by selection.

In genes, the fraction of the amino acids most protected by selection strongly
depends on gene size: it decreases as gene size increases. The effect weakens
as we move to the right along the evolutionary scissors in Fig. 1.

To show this, we ordered all 564 genes under consideration with respect to the
fraction of an examined amino acid and assigned each gene a rank number. Next,
we plotted both the dependence of gene size on the rank and the dependence of
the amino acid fraction in the gene on the rank. The resulting plots in Fig. 2
correspond to two evolutionary extreme cases, represented by tryptophan and
leucine. It is evident that in the case of tryptophan the fraction of that
amino acid in genes is anti-correlated with gene size (notice that about 1/3
of the genes do not possess tryptophan). In the case of leucine there is a
crossover, and the effect of selection is evident only for the genes that have
more than 10% of that amino acid. When the fraction of leucine is less than
10% there is even a reverse effect, i.e., an increasing fraction is correlated
with an increasing gene size. If we look at the evolutionary scissors in
Fig. 1, we can see that in the case of leucine the survival time that
originates from pure mutational pressure is longer than its selectional
counterpart. Recently, a paper by Xia and Li [8] has appeared discussing which
amino acid properties (such as polarity, isoelectric point, volume, etc.)
affect protein evolution. Thus, it may be possible to relate these properties
to our discussion of the evolutionary scissors.
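
The ranking procedure described above can be sketched in a few lines. The
protein sequences and gene names below are hypothetical, and the descending
sort order (rank 1 for the highest fraction) is an assumption, since the text
does not state the direction of the ranking.

```python
def rank_by_fraction(proteins, amino_acid):
    """Rank genes by the fraction of one amino acid in their protein.

    Returns a list of (rank, name, length, fraction), highest fraction first.
    """
    rows = []
    for name, seq in proteins.items():
        fraction = seq.count(amino_acid) / len(seq)
        rows.append((fraction, len(seq), name))
    rows.sort(key=lambda row: row[0], reverse=True)
    return [(rank, name, length, fraction)
            for rank, (fraction, length, name) in enumerate(rows, start=1)]


# Hypothetical toy proteins (one-letter amino acid codes), not real genes.
toy_proteins = {
    "geneA": "MKWLLW",
    "geneB": "MKLLLLKKAA",
    "geneC": "MWKL",
}

ranked = rank_by_fraction(toy_proteins, "W")
for rank, name, length, fraction in ranked:
    print(rank, name, length, round(fraction, 3))
```

Plotting length against rank and fraction against rank for all genes would
give the two curves of each graph in Fig. 2.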

5 Conclusions 

We have shown that the amino acids that experience the highest selection
pressure have the shortest turnover time with respect to mutation pressure.
The fraction of these amino acids in genes depends on gene size. The
selectional role of amino acids such as leucine, from the right-hand side of
the evolutionary scissors, is quite different: although they have a long
turnover time with respect to mutational pressure, their fraction cannot be
too high. This could be considered an effect of optimising the coding of
genetic information under the processes of mutagenesis and phenotype
selection for protein function.